(54) Distributed client-based data caching system and method

(57) A system and method for enabling data package distribution to be performed by a plurality of peer clients connected to each other through a network, such as a LAN (local area network). Each peer client can obtain data packages from another peer client or from an external server. However, each peer client preferably obtains data packages from other peer clients, rather than obtaining data packages from the external server.

[0001] The present invention relates to a distributed client-based data caching system and method. Specifically, the present invention enables data packages to be served to a client through a flexible, non-deterministic distributed system of peer clients which cache the data packages, in order to maximize efficiency and speed for serving the data package to the client.

[0002] Networks of two or more computers, such as the Internet or intranets, enable client computers to obtain data, such as documents, images, messages, data packages or other types of data, from remote storage media which are not installed on the client computer itself. Instead, these remote storage media are managed and operated through a remote computer, known as a server computer or simply as a "server" (in the same vein, the client computer is also often termed only a "client"). One advantage of such a system is that the client computer can potentially obtain data from any server on the network. The disadvantage of the system is the requirement for sufficient bandwidth on the network to enable data to be transmitted from the server to the client. Furthermore, if the load is not evenly distributed between servers on the network, one server may become overwhelmed with requests, thereby decreasing the speed and efficiency of retrieval. Thus, currently many networks cannot provide rapid and efficient data retrieval due to the heavy demands placed upon the available bandwidth.

[0003] Proxy servers are often installed to conserve bandwidth on an Internet connection or on connections to other LANs (local area networks). These proxy servers cache frequently accessed data, thereby reducing the load on the main server, and distributing demand for bandwidth more evenly across the network. Unfortunately, such proxy servers are typically expensive to maintain. Furthermore, proxy servers require dedicated computers to be installed and configured. Each computer on the LAN has to be separately configured in order to communicate with the proxy server. Such configuration is deterministic, such that each client must be configured to communicate with each proxy server separately. Thus, proxy servers have many drawbacks.

[0004] A more useful solution would enable Intranets to reap the benefits of the proxy server, without requiring ded- 
icated machines and without requiring any special installation or configuration. Furthermore, such a solution would not 
be deterministic, such that each client could communicate with more than one server according to the load on each 
server, rather than according to the configuration of the client itself. Unfortunately, such a solution is not currently available.

[0005] Therefore, there is an unmet need for, and it would be highly useful to have, a distributed client-based data
caching system and method which enable data to be stored and retrieved from a plurality of peer clients, or "caching 
entities", yet which does not require any special configuration or installation of separate servers. 
[0006] The present invention is of a distributed client-based data caching system and method, which enable data 
to be served to a client through a flexible, non-deterministic distributed system of caching entities, in order to maximize 
efficiency and speed for serving the document to the client. The caching entities are peer clients which serve the data
to each other, thereby reducing the amount of bandwidth required to obtain data from an external server. 
[0007] According to the present invention, there is provided a method for distributing data packages across a net- 
work, the network featuring an external server for serving at least one data package, the external server being a dedi- 
cated server, the steps of the method being performed by a data processor, the method comprising the steps of: (a) 
providing a plurality of peer clients attached to the network and a list of data packages being stored by each of the plu- 
rality of peer clients, each data package on the list of data packages having an entry, the entry indicating a unique identifier
for the data package and a location of the data package in at least one of the plurality of peer clients; (b) examining
the list of data packages by a first peer client to find an entry for a data package; and (c) if the entry for the data package 
is present on the list of data packages of the first peer client, retrieving the data package from the location at another of 
the plurality of peer clients according to the entry for the data package. 
[0008] Alternatively, the list of data packages is stored on the external server.

[0009] According to preferred embodiments of the present invention, the list of data packages is stored on at least the first peer client. Preferably, if the entry for the data package is instead absent from the list of data packages of the first peer client, the method further comprises the steps of: (d) sending a request message for the data package by the first peer client to at least one other peer client; and (e) if a response message is received by the first peer client from the at least one other peer client, retrieving the data package from the at least one other peer client by the first peer client.

[0010] Preferably, the request message and the response message are transmitted to the plurality of peer clients 
by broadcasting. Alternatively, the request message and the response message are transmitted to the plurality of peer 
clients by multicasting. Also alternatively, the request message and the response message are transmitted to the plu- 
rality of peer clients by polling each peer client individually. 

[0011] Also alternatively and preferably, if the response message is not received from the at least one other peer client by the first peer client, the method further comprises the step of: (f) obtaining the data package by the first peer client from the external server. Preferably, the method further comprises the step of sending a response message by the first peer client to the at least one other peer client substantially before the first peer client obtains the data package from the external server. More preferably, the list of data packages is stored on each of the plurality of peer clients, and the method further comprises the steps of: (g) receiving the response message from the first peer client by the at least one other peer client; and (h) altering the list of data packages being stored by the at least one other peer client for indicating the location of the data package according to the response message.

[0012] Alternatively, the list of data packages is stored on each of the plurality of peer clients, and the method further
comprises the steps of: (g) receiving the response message from the first peer client by the at least one other peer
client; and (h) altering the list of data packages being stored by the at least one other peer client for indicating the loca- 
tion of the data package according to a probabilistic function. 

[0013] Preferably, the probabilistic function is performed according to a set of equations:

$$
\text{New location} =
\begin{cases}
\text{Old location}, & P_o(x) = \dfrac{1}{\text{generation} + 1} \\[6pt]
\text{New location}, & P_n(x) = 1 - \dfrac{1}{\text{generation} + 1}
\end{cases}
$$

wherein Pn(x) is a probability that the new location is substituted for the old location, Po(x) is a probability that the old location is retained, and "generation" indicates how many times the location had been previously changed.
[0014] Also preferably, an upper limit is predetermined for a number of the plurality of peer clients served substantially simultaneously by the at least one other peer client, such that if a number of the plurality of peer clients served substantially simultaneously by the at least one other peer client is greater than the upper limit, the method further comprises the step of: (d) sending a busy message from the at least one other peer client to the first peer client.
[0015] Preferably, the external server is a Web server, and the plurality of peer clients is a plurality of Web browsers.
[0016] Also preferably, the external server is a BackWeb™ server, and the plurality of peer clients is a plurality of
BackWeb™ clients. 

[0017] Preferably, the unique identifier for the data package is an MD5 digest of the data package.
[0018] According to still other preferred embodiments of the present invention, the step of retrieving the data package is performed according to a protocol based on TCP/IP. Preferably, the protocol is HTTP. Alternatively and preferably, the protocol is FTP.

[0019] Hereinafter, the term "protocol based on TCP/IP" includes any such protocol, including but not limited to the
HTTP (hypertext transfer protocol) and FTP (file transfer protocol) protocols.

[0020] Hereinafter, the term "data package" refers to any discrete, identifiable unit of data, including but not limited 
to documents, images, messages, data packages or any other type of data. 

[0021] Hereinafter, the term "computing platform" refers to a particular computer hardware system or to a particular software operating system. Examples of such hardware systems include, but are not limited to, personal computers (PC), Apple Macintosh™ computers, mainframes, minicomputers and workstations, which are also non-limiting examples of data processors for operating a software application under an operating system. Examples of such software operating systems include, but are not limited to, UNIX, VMS, Linux, MacOS™, DOS, and one of the Windows™ operating systems by Microsoft Corp. (USA), including Windows NT™, Windows 3.x™ (in which "x" is a version number, such as "Windows 3.1™"), Windows95™ and Windows98™.

[0022] For the present invention, a software application could be written in a substantially suitable programming language, which could easily be selected by one of ordinary skill in the art. The programming language chosen should be compatible with the operating system according to which the software application is executed. Examples of suitable programming languages include, but are not limited to, C, C++ and Java.
[0023] Hereinafter, the term "broadcast" may also include "multicast" as well.

[0024] The invention is herein described, by way of example only, with reference to the accompanying drawings, 
wherein: 

FIGS. 1A and 1B are schematic block diagrams of an exemplary basic system and method according to the present invention;

FIGS. 2A-2E are schematic block diagrams of an exemplary request/response protocol and method according to the present invention;

FIG. 3 is a schematic block diagram of an exemplary preferred data-flow diagram according to the present invention;

FIG. 4 is a flowchart of a method for operating the system of the present invention with Web browsers; and

FIGS. 5A and 5B are exemplary request and response messages according to the present invention.

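[0025] The present invention is of a distributed client-based data caching system and method, which enable data to be served to a client through a flexible, non-deterministic distributed system of caching entities, in order to maximize efficiency and speed for serving the data to the client. The caching entities are peer clients which serve the data to each other, thereby reducing the amount of bandwidth required to obtain data from an external server.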

[0026] The system and method of the present invention enable clients to share data packages among themselves. The network traffic is not significantly affected, since modern network architectures are well suited for peer-to-peer communications. Most currently operating networks have a star topology, using switching hubs.

[0027] According to currently available client-server software applications known in the art, whenever a client requires a data package, the following algorithm is performed. First, the software application attempts to locate the data package locally. Then, if the data package is not found locally, the software application retrieves the data package from the appropriate server.

[0028] The operation of the system of the present invention adds an intermediate step. For the present invention, the client first attempts to obtain the data package from a peer client on the local network neighborhood before resorting to retrieving the data package from the server.
[0029] Thus, for the system of the present invention, every client actually functions as a caching proxy. Once a client wants a data package, it queries all the hosts, which are actually peer clients, on the local network to locate that data package. If none of the peer clients has the data package, the client retrieves the data package from the external server. However, if a neighboring client has the requested data package, the requesting client will download this data package from the peer client rather than from the external server.

[0030] The principles and operation of the distributed client-based data caching system according to the present
invention may be better understood with reference to the drawings and the accompanying description.

[0031] Figure 1B is a flowchart of the operation of the system of Figure 1A. Figure 1A shows a system 10 which includes a plurality of peer clients 12 connected by a local network 14 of some type, for example a LAN, indicated by the heavier line in Figure 1A. Two peer clients 12, labeled as "peer client 1" 20 and "peer client 2" 22, are shown for the purposes of illustration only and without intending to be limiting in any way. Each peer client 12 is also connected to an external server 16 of some type by an external connection 18. Although only one external server 16 is shown, a plurality of external servers could also be implemented. External server 16 is a dedicated server, in the sense that this server has a primary or at least a substantially significant role as a server for data packages. As shown for the purposes of illustration, external connection 18 only connects to local network 14 at one point, although multiple such external connections could also be implemented (not shown). In addition, external connection 18 could also optionally connect each peer client 12 directly to server 16 (not shown).

[0032] The operation of system 10 according to the present invention is illustrated with reference also to Figure 1B. In step 1, a peer client 12, such as peer client 20, looks for a data package in the local memory or disk cache of that particular peer client 12. If the desired data package is not found on the local disk cache, then in step 2, peer client 12 queries any other peer client(s) 12 on local network 14 to determine whether any other peer client 12 has a particular data package. For example, peer client 20 could query peer client 22, to determine whether peer client 22 has the desired data package. In step 3a, if peer client 22 has the desired data package, then peer client 20 obtains the data package from peer client 22. Alternatively, as shown in step 3b, if peer client 22 does not have the desired data package, then peer client 20 obtains the data package from server 16 through external connection 18. Thus, every peer client 12 is also potentially a server which is internal to local network 14, and hence could be described as an "internal server" to distinguish peer client 12 from external server 16.
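
The lookup order of steps 1, 2, 3a and 3b can be sketched as follows. This is only an illustrative sketch: the function name `fetch_package`, the representation of peers and of the external server as plain callables, and the choice to store a retrieved package in the local cache are assumptions made for the example, not details taken from the description.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical types: a peer is modelled as a callable that returns the
# package bytes, or None when the peer does not hold the package.
PeerQuery = Callable[[str], Optional[bytes]]


def fetch_package(
    package_id: str,
    local_cache: Dict[str, bytes],
    peers: Iterable[PeerQuery],
    external_server: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (data, source), where source is 'local', 'peer' or 'server'."""
    # Step 1: look in the local memory or disk cache.
    if package_id in local_cache:
        return local_cache[package_id], "local"

    # Step 2: query the other peer clients on the local network.
    for ask_peer in peers:
        data = ask_peer(package_id)
        if data is not None:
            # Step 3a: a peer client holds the package.
            local_cache[package_id] = data  # assumption: cache what we fetch
            return data, "peer"

    # Step 3b: no peer has the package, so use the external server.
    data = external_server(package_id)
    local_cache[package_id] = data
    return data, "server"
```

In this sketch a second request for the same package is answered from the local cache, which is what allows the requesting client to act as an "internal server" for its neighbors afterwards.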

[0033] Each peer client 12 could also be described as a "caching entity" and the data stored by each client for serving to other peer clients 12 as "cached data" or "cached data packages".

[0034] A number of different possible embodiments of the system of the present invention can be implemented, of which two illustrative embodiments are shown with reference to the Figures below. Briefly, Figures 2A-2D illustrate an exemplary embodiment of the system of the present invention for implementation with the software application of BackWeb (BackWeb Technologies Ltd., Ramat Gan, Israel) on a local area network (LAN). Figures 4 and 5A-5B illustrate an exemplary embodiment of the system of the present invention for implementation with a Web browser software application on the Internet.

[0035] Figure 2A shows an exemplary local network 24 which features a plurality of peer clients 12, of which three are shown for the purposes of discussion only and without intending to be limiting in any way. For the purposes of discussion only, suppose a peer client 26, labeled "A", wishes to obtain four data packages "W", "X", "Y" and "Z". None of these data packages are local to peer client 26, which must therefore obtain these data packages from either another peer client 12 as an internal server, or from an external server (not shown). Local area network 24 features two other peer clients 12: peer client 28, labeled "B", and peer client 30, labeled "C". Peer client 26 must therefore first communicate a request to peer client 28 and peer client 30 to see if the desired data packages are available at either location, and then peer client 26 must obtain these data packages from peer client 28 or peer client 30 if the desired data packages are available.

[0036] Preferably, two protocols are used for communication between peer clients on a local area network (LAN), a data package-exchange protocol and a control protocol. Specifically, the data package-exchange protocol is used to transfer data packages between peer clients, once the desired data package has been located, and is described in greater detail with respect to Figure 2B below. The control protocol enables each peer client to efficiently build and maintain tables which describe the location of available data packages across the local area network by exchanging messages.

[0037] Each peer client maintains two hash-tables which contain information about data package location: a local-data packages table and a network-data packages table. The local-data packages table is a hash-table of data packages which reside on the storage medium or media of the peer client itself. The network-data packages table is a hash-table of data packages which reside on the storage medium or media of other clients on the local network. This table contains the local area network address of the peer client on which each data package is being stored. The size of this hash-table is preferably limited in order to reduce memory consumption. More preferably, each entry in the table has a time-stamp, such that older entries are purged when the size of the table exceeds the upper permissible limit.
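
A size-limited network-data packages table with time-stamped entries might look like the sketch below. The class name `NetworkPackagesTable`, the default limit of 1024 entries and the policy of purging the single oldest entry at a time are assumptions chosen for illustration; the description only states that older entries are purged once the permissible size is exceeded.

```python
import time
from typing import Dict, Optional, Tuple


class NetworkPackagesTable:
    """Maps a package identifier to (peer address, time stamp)."""

    def __init__(self, max_entries: int = 1024) -> None:
        # Hypothetical default; the description leaves the limit open.
        self.max_entries = max_entries
        self._entries: Dict[str, Tuple[str, float]] = {}

    def record(self, package_id: str, peer_address: str,
               now: Optional[float] = None) -> None:
        """Store or refresh the advertised location of a package."""
        stamp = time.time() if now is None else now
        self._entries[package_id] = (peer_address, stamp)
        # Purge the oldest entries while the table exceeds its limit.
        while len(self._entries) > self.max_entries:
            oldest = min(self._entries, key=lambda k: self._entries[k][1])
            del self._entries[oldest]

    def locate(self, package_id: str) -> Optional[str]:
        """Return the peer address for a package, or None if unknown."""
        entry = self._entries.get(package_id)
        return entry[0] if entry is not None else None
```

The local-data packages table could be a plain dictionary keyed by the same identifier, since it needs neither an address nor a purge policy in this sketch.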
[0038] In order to effectively identify the desired data package, preferably each data package has a unique identifier or "fingerprint" associated with it. More preferably, this unique identifier is an MD5 digest of the content of the data package (for a description of the MD5 specification, which is an industry standard and would therefore be obvious to one of ordinary skill in the art, see "RFC 1321" at http://ds.internic.net/rfc/rfc1321.txt).
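
As a brief illustration of such a fingerprint, the sketch below computes the 128-bit MD5 digest of a package's content with Python's standard `hashlib` module. The function name `package_fingerprint` is hypothetical.

```python
import hashlib


def package_fingerprint(content: bytes) -> str:
    """Return the MD5 digest of the package content as 32 hex characters."""
    return hashlib.md5(content).hexdigest()
```

Two packages with identical content yield the same identifier regardless of which peer client stores them, which is what lets the identifier serve as a network-wide lookup key.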

[0039] Once any peer client knows both the unique identifier and the location of the data package on the local network, that client can then proceed to download the data package. However, the peer client may not know the location of the desired data package, in which case the client must follow a control protocol according to the present invention in order to determine the location of the desired data package and to enable the client to build these hash tables with respect to future attempts to locate a data package.

[0040] The control protocol is used to provide each client with knowledge about the locations of data packages across the local network. In the preferred implementation illustrated with respect to Figures 2A-2D, control messages are preferably sent and received as broadcast or multicast packets. Local area networks such as Ethernet networks support broadcast or multicast packets such that all peer clients on a local area network receive the broadcast or multicast packets. Effectively, a single packet can be sent to all peer clients by using broadcast or multicast, thereby reducing the amount of traffic on the network required as a result of transmitting the request message (see for example Chapter 12, "Broadcasting and Multicasting", of TCP/IP Illustrated Volume 1, by W. Richard Stevens, Addison-Wesley, 1994). However, optionally the system of the present invention could poll each peer client individually with a control message for that peer client, although this is not preferred since such individually addressed messages would consume excessive amounts of available bandwidth. In such a situation, preferably polling would be restricted to a certain group of peer clients as internal servers, in order to reduce the amount of traffic on the local area network.
[0041] For the preferred implementation in which broadcast or multicast is used, more preferably, the decision to select either IP multicast or broadcast is made according to the configuration set by the network administrator for the local area network. IP multicast is preferable in terms of load on the clients of the local network, but may not be supported on all platforms (operating systems). More preferably, the TTL or Time to Live may be configured. The TTL specifies the number of routers a packet can cross before being dropped. Configuring the TTL enables data package sharing to be expanded across subnet boundaries.
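
The choice between multicast and broadcast, and the configurable TTL, could be expressed with the standard socket options as in the sketch below. The function name `open_control_socket` and its parameters are hypothetical, and the sketch only configures a UDP socket; it does not send anything or join a multicast group.

```python
import socket
import struct


def open_control_socket(ttl: int = 1, use_multicast: bool = True) -> socket.socket:
    """Create a UDP socket configured for multicast or broadcast sending."""
    if not 0 <= ttl <= 255:
        raise ValueError("TTL must fit in one byte")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if use_multicast:
        # A TTL of 1 keeps multicast packets on the local subnet; larger
        # values allow them to cross that many routers.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("B", ttl))
    else:
        # Fallback for platforms without multicast support.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock
```

In a deployment along the lines described, `use_multicast` and `ttl` would come from the configuration set by the network administrator.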

[0042] As shown with respect to Figure 2B, the control protocol of the present invention preferably operates as follows. In step 1, peer client "A" from Figure 2A looks for a data package on the local storage medium or media. In step 2, since the data package was not found locally on the medium or media of peer client "A", peer client "A" must download the data package and therefore preferably multicasts (or alternatively broadcasts) a request message. A request message preferably contains a protocol identifying version number (PVN) for the control protocol of the present invention and a list of MD5 digests of the needed data packages, as shown in Figure 2C.

[0043] Optionally and preferably, if more than one data package is desired, a list of requested data packages is included in the request message rather than a single MD5 digest, in order to reduce the total number of request messages on the network.
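
A request message carrying a PVN and a list of MD5 digests could be encoded as in the following sketch. The byte layout used here (a one-byte PVN, a two-byte digest count in network byte order, then the 16-byte digests) and the PVN value of 1 are assumptions made for illustration only; the actual layout of Figure 2C is not reproduced in the text.

```python
import hashlib
import struct
from typing import List, Tuple

PVN = 1          # hypothetical protocol version number
DIGEST_LEN = 16  # an MD5 digest is 128 bits


def encode_request(digests: List[bytes], pvn: int = PVN) -> bytes:
    """Pack a PVN and a list of MD5 digests into one request message."""
    if any(len(d) != DIGEST_LEN for d in digests):
        raise ValueError("each MD5 digest must be 16 bytes")
    header = struct.pack("!BH", pvn, len(digests))
    return header + b"".join(digests)


def decode_request(message: bytes) -> Tuple[int, List[bytes]]:
    """Unpack a request message into (pvn, list of digests)."""
    pvn, count = struct.unpack_from("!BH", message, 0)
    body = message[struct.calcsize("!BH"):]
    if len(body) != count * DIGEST_LEN:
        raise ValueError("message length does not match digest count")
    digests = [body[i:i + DIGEST_LEN] for i in range(0, len(body), DIGEST_LEN)]
    return pvn, digests
```

Batching several digests into one message, as the paragraph above recommends, costs 16 bytes per additional package instead of one extra packet per package.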

[0044] In step 3, the neighboring clients, shown as peer clients "B" and "C" in Figure 2A, receive this request message and search for the requested data package in their local-data packages hash-table. A peer client which does not find the data package locally does not reply, as shown in step 4a. Otherwise, in step 4b, the peer client sends a response message, preferably after waiting a short random time interval.

More preferably, the peer client does not reply if another peer client has replied first, in order to reduce unnecessary traffic on the local area network. Preferably, the peer client distributes the response message by broadcast or multicast.

Suppose client "B" is the first to reply, with a reply message indicating possession of data packages "W", "X" and "Y". Suppose another client "C" has data packages "X", "Y" and "Z".

In step 5, peer client "A" downloads the data package or data packages. In principle, according to a relatively simple implementation of the present invention, at this stage the requesting client either receives a reply and downloads the data packages from the client that replied; or, if a reply is not received within a certain period of time, downloads these data packages from an external server. If the data package is obtained from a peer client as an internal server, the data package-exchange protocol is used to obtain the data package (a description was available from "http://ds.internic.net/rfc/…" as of September 1998).

However, preferably a more complex implementation is employed, since such a simple implementation may cause multiple clients to fetch the same data packages from the external server simultaneously. This situation arises when several peer clients need to download the same data packages at substantially the same time, which is a probable scenario for push clients for which content delivery is triggered by an external server. In such a scenario, none of the peer clients would yet have the data packages when the new client request is broadcast, such that none of them would be ready to serve these data packages. Thus, multiple peer clients would each obtain the same data packages from the external server.

More specifically, the problem is solved by notifying other clients when a first client is downloading the data package. In this preferred embodiment, the first client which requires the data package obtains the data package from the external server. Other clients then obtain the data package from the first client, rather than retrieving the data package from the external server. This embodiment of the method of the present invention is described in greater detail with regard to Figure 2E.

In step 1, the requesting client again transmits the request, again preferably by broadcasting or multicasting. If no response is received after a certain period of time, in step 2 the client transmits a response message, effectively replying to its own request, indicating that this client either has the data package, or in this case, that the client is retrieving the data package. In step 3, the client retrieves the data package from the external server. In step 4, other clients create an entry in their network data packages hash table, indicating the location of the client which is serving the data package. Thus, only a single client accesses the external server.

If a request is sent for multiple data packages, but a response is received indicating the location of only some of the data packages at a neighboring peer client or clients, the client first obtains these data packages from the neighboring peer client or clients. Next, the client transmits the response message for the rest of the data packages and proceeds to obtain the remaining data package or data packages from the external server. Thus, the client only obtains the data package or data packages from the external server which are not available locally, rather than obtaining all of the data packages from the external server, thereby reducing network traffic.
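
The division of a multi-package request between neighboring peer clients and the external server can be sketched as below. The function name `plan_downloads` and the representation of advertised locations as a dictionary from package identifier to peer address are assumptions for the example.

```python
from typing import Dict, Iterable, List, Tuple


def plan_downloads(
    requested: Iterable[str],
    advertised: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split requested packages into peer downloads and server downloads.

    `advertised` maps a package identifier to the address of a peer client
    that responded for it. Packages without a response fall to the server.
    """
    from_peers: Dict[str, str] = {}
    from_server: List[str] = []
    for package_id in requested:
        if package_id in advertised:
            from_peers[package_id] = advertised[package_id]
        else:
            from_server.append(package_id)
    return from_peers, from_server
```

For instance, if client "A" requests four packages and only one peer responds for three of them, the sketch sends just the fourth request to the external server.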

[0054] According to preferred embodiments of the present invention, preferably the process of downloading data packages from peer clients is optimized to reduce the amount of time required for downloading, the load on each individual client and the overall network traffic. Such optimization is performed as follows.

[0055] First, preferably the exit degree of each client is bounded, such that each client is only able to serve a fixed, limited number of other clients simultaneously. More preferably, the default limit is three other clients, for example, or some other appropriate number which is preferably configured by the user or by the network administrator. If an additional client attempts to download a data package from a client which is already serving the maximum number of other clients, the additional client receives a "busy" message. This feature limits the load on each individual client.
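
The bounded exit degree might be tracked as in the sketch below. The class name `ServingSlots`, the string return values and the decision to let an already-served client proceed without consuming a second slot are assumptions made for illustration.

```python
from typing import Set


class ServingSlots:
    """Tracks which clients are currently being served, up to a limit."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit  # default of three, as suggested in the text
        self.active: Set[str] = set()

    def try_begin(self, client_id: str) -> str:
        """Return 'ok' if the transfer may start, otherwise 'busy'."""
        if client_id in self.active:
            return "ok"  # assumption: one slot per client
        if len(self.active) >= self.limit:
            return "busy"
        self.active.add(client_id)
        return "ok"

    def finish(self, client_id: str) -> None:
        """Release the slot once the transfer ends."""
        self.active.discard(client_id)
```

A client that receives "busy" would be expected to try another advertised location or fall back to the external server.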

[0056] Also preferably, the present invention is able to optimize the selection of the best client from which the data 
package should be obtained. For example, if client "A" had already downloaded a larger portion of the required data 
package than client "B", transferring the data package from client "A" is more optimal. Such clients are preferentially 
selected to serve data packages, since these clients will be able to serve the data package after a shorter time period 
has elapsed. Such preferential selection occurs by shortening the time period for waiting before these clients respond, 
thereby increasing the likelihood that they will serve the data packages. For this reason, the client preferably calculates 
the random delay before responding such that the delay is inversely proportional to the percentage of the data package
which has been already downloaded. In addition, the random delay is preferably proportional to the number of clients
being served at the moment, in order to decrease the likelihood of overloading already busy clients.
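
One possible way to compute such a delay is sketched below. The base delay of 50 ms, the random jitter range and the use of `1 + clients_served` (so that an idle client still waits a non-zero time) are all assumptions; the description only fixes the two proportionalities.

```python
import random


def response_delay(
    fraction_downloaded: float,
    clients_served: int,
    base_delay: float = 0.05,
    rng: random.Random = random,
) -> float:
    """Return a random delay in seconds before sending a response.

    The delay shrinks as more of the package is already downloaded and
    grows with the number of clients currently being served.
    """
    if not 0.0 < fraction_downloaded <= 1.0:
        raise ValueError("fraction_downloaded must be in (0, 1]")
    if clients_served < 0:
        raise ValueError("clients_served must be non-negative")
    jitter = rng.uniform(0.5, 1.0)  # the random component
    return base_delay * jitter * (1 + clients_served) / fraction_downloaded
```

With the same random draw, a client holding half of a package waits twice as long as one holding all of it, so the more complete copy tends to answer first.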
[0057] In addition, according to other preferred embodiments of the present invention, preferably the entries of the locations of data packages in the network data packages table are updated according to a probabilistic function. Such a function is preferred in order to prevent all of the clients from registering a single client as the server for any particular data package, for example. When different clients respond, usually at different times, indicating they have a specific data package, the remaining clients listening across the network update the entry for this data package in their network data packages table, by adding to this table the IP address, or some other type of address according to the addressing system employed by the network, of the client which can serve the data package. In a simple implementation, the clients would store only the last advertised location of each data package, and therefore many or all clients might attempt to obtain the data package from a single client as the internal server, thereby overloading that client.
[0058] To avoid this situation, preferably the following probabilistic algorithm is used to determine the particular client address which is stored in the network data packages table. Each time a new client transmits a response message, indicating that this client is able to serve a particular data package, the probability that the new IP address of the new client is substituted for the old IP address is calculated according to the following equations:


$$
\text{New IP address} =
\begin{cases}
\text{Old IP address}, & P_o(x) = \dfrac{1}{\text{generation} + 1} \\[6pt]
\text{New IP address}, & P_n(x) = 1 - \dfrac{1}{\text{generation} + 1}
\end{cases}
$$

wherein Pn(x) is the probability that a new IP address is substituted for the old IP address, Po(x) is the probability that the old IP address is retained, and "generation" is a number indicating how many times this address had been previously changed.

[0059] For example, if client "A" responds indicating it has data package "X", then initially all other peer clients store the IP address of client "A" as the location of data package "X". If client "B" then broadcasts a response also indicating that client "B" has data package "X", then the probability that any one client now changes the IP address for the location of data package "X" is 50%. In other words, about half of the clients should now point to client "A" and about half should point to client "B".
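
The probabilistic update can be sketched directly from the formula Po(x) = 1/(generation + 1). The function names are hypothetical, and the sketch assumes that storing the first advertised address counts as one change (generation = 1), since that is the reading under which the second response is adopted with the 50% probability given in the example.

```python
import random
from typing import Tuple


def keep_probability(generation: int) -> float:
    """Po(x): probability that the old address is retained."""
    return 1.0 / (generation + 1)


def update_location(
    old_address: str,
    new_address: str,
    generation: int,
    rng: random.Random = random,
) -> Tuple[str, int]:
    """Return (stored address, generation) after a new advertisement."""
    if rng.random() < keep_probability(generation):
        return old_address, generation      # old address retained
    return new_address, generation + 1      # address changed once more
```

Because every listening client draws independently, the stored addresses spread across the responders instead of all pointing at the last one heard.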

[0060] Such a substantially even distribution of load across multiple clients should produce data-flow with a tree-shaped topology, as shown in Figure 3, rather than a random topology, thus optimizing the average download time and the load on the serving clients.

[0061] Furthermore, if any client requests a particular data package during the period required by client "A" for downloading that package, preferably client "A" sends a broadcast or multicast message indicating that the package is in the process of being downloaded. Therefore, preferably only a single client "B" polls client "A" for each data package, for example. Other clients preferably automatically receive any responses from that polling action through the broadcast or multicast transmission, and thus will not be forced to poll for themselves.

[0062] The polling (request/response) traffic is optimized since there is usually no need to transmit both a request and a response for each data package needed by each client. Such optimization is possible since each client preferably listens to the broadcast or multicast responses sent to other clients.

The system preferably features a timer, for detection of an aborted transfer or a very slow data package transfer, for example. The timer determines whether a time-out has occurred. If a time-out occurs, the requesting client preferably requests the data package once more. After a plurality of time-outs, the client preferably ceases to attempt to obtain the data package from that peer client.

A serving client which is still downloading the data package preferably sends a message indicating that the data package is not ready, as well as an indication of the portion of the data package already downloaded. The requesting client continues polling the serving client until the download is complete. If the download becomes substantially slower or is otherwise interrupted or terminated for a long period of time, the requesting client behaves as if a timeout occurred.

[0065] [...] preferred features of the present invention, substantially automatic detection of peer 
clients [...]. Such automatic detection enables each peer client to detect the presence of other peer clients [...]. 
If no other peer clients are found, the system of the present invention is disabled, since the [...] 

[0066] Preferably, the amount of bandwidth on the local area network which is consumed by each peer client 
serving [...] to other peer clients is limited [...] specific host. This limit is preferably 
configurable by the user or by the network administrator. 

[0067] [...] to protect peer clients from unauthorized access of local storage media through the 
system of the present invention, certain security features are preferably included. For example, preferably only data 
packages [...] hash table are able to be transferred [...] data package, [...] 
intended to be served to the peer clients. Such [...] users preferably 
[...] the system of the present invention to [...] "random" data packages from the storage media of a peer 
client. [...] are more preferably only referenced by their unique identifier, such as their 128-bit MD5 digest, 
such that a data package is only able to be downloaded from a client if the intended recipient knows this digest. Thus, 
[...] a package alone is preferably sufficient information to permit retrieval of the data package from a [...] 
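
As a rough illustration of referencing packages only by their digest, the following Python sketch keys a serving client's store by the 128-bit MD5 digest of each package, so that a package can only be requested by a party that already knows the digest. The class `PackageStore` and the function `package_digest` are hypothetical names introduced for this sketch; `hashlib.md5` is the standard-library MD5 implementation.

```python
import hashlib
from typing import Dict, Optional


def package_digest(payload: bytes) -> str:
    """Return the 128-bit MD5 digest of a package as 32 hex characters."""
    return hashlib.md5(payload).hexdigest()


class PackageStore:
    """Hypothetical store that exposes packages by digest only."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, bytes] = {}

    def add(self, payload: bytes) -> str:
        """Register a package that is intended to be served to peers."""
        digest = package_digest(payload)
        self._by_digest[digest] = payload
        return digest

    def serve(self, digest: str) -> Optional[bytes]:
        """Return the package for a known digest, or None otherwise.

        There is deliberately no lookup by file name or path, so files
        that were never registered cannot be reached through the store.
        """
        return self._by_digest.get(digest)
```

A request carrying an unknown or guessed identifier simply yields `None`, which corresponds to refusing to serve "random" data from local storage.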

[0068] [...] embodiment of the present invention, the system of the present invention is also 
applicable to Web browsers, FTP clients, and other software applications involving client-server data-transfer. As described 
with reference to Figures 4 and 5A-5B, another exemplary embodiment of the present invention is used for caching Web [...] 

[0069] In step 1 of Figure 4, a Web browser being operated by a client [...] requests a specific data package. 
First, the Web browser looks at the local cache, as is known to one of ordinary skill in the art. If the data package is found 
in the local cache, then that data package is retrieved from the local cache. Otherwise, the Web browser issues a 
message requesting this data package, preferably by using broadcast or multicast message transmission. The data 
package is preferably uniquely defined by a unique identifier. More preferably, the unique identifier is the URL of the data 
package, or alternatively and preferably a combination of the URL of the data package and timestamp, or by any other 
suitable unique identifier. 

[0070] For optimization, if more than one data package is required, the Web browser preferably transmits one 
request message containing the list of needed data packages, thereby reducing the total network traffic across the 
network. Such a situation may arise if, for example, the Web browser had just parsed an HTML (hypertext mark-up 
language) document, or Web page, which contains many links to follow. Preferably and optionally, each request message 
contains an identifying "magic number", which may contain the protocol version (PVN), for instance "V1.0". As shown 
in Figure 5A, the request message includes the list of URLs or other unique identifiers to identify the data package or 
data packages being requested, which is similar in function to the list of MD5 digests described previously for request 
messages, and a unique identifier identifying the request message, shown as "REQ". 
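
The request message described above could be assembled and parsed along the following lines. The specification names the fields (a magic number carrying the protocol version, the "REQ" identifier, and a list of URLs) but the layout on the wire is not given here, so the newline-separated text encoding and the names `build_request` and `parse_request` are assumptions made purely for illustration.

```python
from typing import List

MAGIC = "V1.0"       # assumed magic number carrying the protocol version
REQUEST_TAG = "REQ"  # message-type identifier named in the text


def build_request(urls: List[str]) -> bytes:
    """Pack several URLs into a single request message (assumed layout)."""
    lines = [MAGIC, REQUEST_TAG, *urls]
    return "\n".join(lines).encode("utf-8")


def parse_request(datagram: bytes) -> List[str]:
    """Return the requested URLs, rejecting other versions or types."""
    lines = datagram.decode("utf-8").split("\n")
    if len(lines) < 2 or lines[0] != MAGIC or lines[1] != REQUEST_TAG:
        raise ValueError("not a request message of the expected version")
    return lines[2:]
```

Sending one message that lists every link found on a parsed page, rather than one message per link, is what keeps the request traffic low.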

[0071] In step 2, other Web browsers across the network listen to detect request messages of this type. These Web 
browsers, which are peer clients for this embodiment of the present invention, receive this request message and check 
their own cache for the requested URL. If the requested URL is found in the local cache of a Web browser, that Web 
browser preferably waits a random interval and then preferably transmits a response message indicating it has the 
required data package (or data packages). Preferably, the message is broadcast or multicast. More preferably, that Web 
browser does not reply if another Web browser had replied first. A reply message is preferably sent by a particular Web 
browser even if the requested URL is still being downloaded by that Web browser. 

[0072] In step 3, if no response to an issued request message is received within a certain amount of time, for 
example 5 seconds, then the process is preferably timed out. In this case, the Web browser preferably no longer attempts to 
obtain the URL from another Web browser, and the URL is obtained from the regular Web server using regular HTTP 
protocol. Before starting to download the data package from the regular Web server, the Web browser preferably 
transmits a response message indicating that this particular Web browser is downloading the data package. 
[0073] On the other hand, if a response message is received, the Web browser obtains the URL from the other Web 
browser which indicated that it had the URL in the local cache. Preferably, Web browsers across the network record the 
URLs and the address from which the response message originated for future use, such that these Web browsers 
would be able to download the URL at a future time without first transmitting the request message. 

[0074] Once the Web browser is able to locate a data package on a neighboring Web browser, the Web browser 
attempts to download the data package. The downloading process is performed with a suitable data-transfer protocol, 
such as HTTP or FTP. If a time-out or other failure occurs during the processing of data package transfer, the receiving 
Web browser preferably performs substantially the entire procedure more than once. More preferably, the number of 
permitted attempts to retry the transfer is configurable. If the process fails after these attempts have been performed, 
preferably the Web browser transfers the required data package or data packages from the regular Web server. 
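
The lookup order of steps 1 to 3, together with the retry behaviour just described, can be summarised in a short Python sketch. The networking itself is left out: `locate_peer`, `fetch_from_peer` and `fetch_from_server` are hypothetical callables supplied by the caller, `locate_peer` is assumed to return `None` when no response arrives before the time-out, and the default of three attempts is an arbitrary illustrative value.

```python
from typing import Callable, Dict, Optional


def retrieve(url: str,
             local_cache: Dict[str, bytes],
             locate_peer: Callable[[str], Optional[str]],
             fetch_from_peer: Callable[[str, str], bytes],
             fetch_from_server: Callable[[str], bytes],
             max_attempts: int = 3) -> bytes:
    """Return the package for url: local cache, then peers, then server."""
    # Step 1: the local cache is consulted first.
    if url in local_cache:
        return local_cache[url]

    # Steps 2 and 3: ask the peers, repeating the whole procedure
    # up to max_attempts times if a transfer fails.
    for _ in range(max_attempts):
        peer = locate_peer(url)
        if peer is None:
            break  # no response before the time-out: stop asking peers
        try:
            data = fetch_from_peer(peer, url)
        except (TimeoutError, ConnectionError):
            continue  # failed transfer: retry the procedure
        local_cache[url] = data
        return data

    # Fallback: obtain the package from the regular Web server.
    data = fetch_from_server(url)
    local_cache[url] = data
    return data
```

Passing the transport functions in as parameters keeps the sketch independent of whether HTTP, FTP or another data-transfer protocol is used between peers.
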
[0075] According to preferred features of this embodiment of the present invention, data package downloading is 
well distributed, such that the Web browsers do not obtain a data package from only a single Web browser, but rather 
obtain the data package from a plurality of Web browsers. Such distribution is maintained as follows. 

[0076] First, preferably the number of simultaneous data package transfers from a single Web browser is limited. If 
this number is exceeded, the Web browser transmits a "busy" message to other Web browsers attempting to transfer 
the data package. Next, preferably once a Web browser receives a message giving the location of a particular data 
package, the corresponding entry in the hash table for that data package is not altered every time another response 
message is received pertaining to this data package. The hash table is preferably altered by subsequent messages in 
a probabilistic manner, such that the probability that any particular entry is updated to indicate a new location of a data 
package is equal to 1/(generation+1), where "generation" counts the number of times a response message was received 
for that data package. 

[0077] For example, if Web browser "A" transmits a response message indicating that data package "X" is on the 
local cache, then initially all of the neighboring Web browsers have an entry in the hash table indicating that Web 
browser "A" is the location of data package "X". If Web browser "B" then transmits a response message for data 
package "X", then each Web browser preferably now alters the entry in the hash table to indicate a new location of data 
package "X" with a probability of about fifty percent, such that about fifty percent of the Web browsers now have an 
entry indicating that the data package is available from Web browser "A" and such that about fifty percent of the Web 
browsers now have an entry indicating that the data package is available from Web browser "B". Thus, a good load 
distribution can be achieved. 
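
To see why updating with probability 1/(generation+1) spreads the load, the following Python sketch simulates many browsers that each hear the same sequence of response messages. It is an illustration only: the class `LocationTable` and the function `simulate` are hypothetical names, and "generation" is assumed to be the number of responses already received for the package when a further response arrives.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


class LocationTable:
    """Hypothetical per-browser hash table: package -> (address, generation)."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng
        self._entries: Dict[str, Tuple[str, int]] = {}

    def on_response(self, package_id: str, address: str) -> None:
        """Record a response, replacing the entry with prob. 1/(generation+1)."""
        if package_id not in self._entries:
            self._entries[package_id] = (address, 1)  # first response is stored
            return
        current, generation = self._entries[package_id]
        if self._rng.random() < 1.0 / (generation + 1):
            current = address
        self._entries[package_id] = (current, generation + 1)

    def location(self, package_id: str) -> str:
        return self._entries[package_id][0]


def simulate(num_browsers: int, responders: List[str], seed: int) -> Counter:
    """Count how many browsers end up pointing at each responder."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(num_browsers):
        table = LocationTable(rng)
        for address in responders:
            table.on_response("X", address)
        counts[table.location("X")] += 1
    return counts
```

Under this assumption each responder is retained with equal probability regardless of the order of the responses, so with responders "A" and "B" about half of the browsers point to each, and adding a third responder "C" would be expected to give about one third each.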

[0078] The random delay (mentioned in step 2 above) chosen by a browser is proportional to the number of 
currently served browsers, or the number of browsers currently downloading data packages from that browser, and 
inversely proportional to the amount of the data package already downloaded by it. This way, the browsers more eligible 
to download from are more likely to be chosen by other browsers to serve these data packages. 
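
One possible reading of this delay rule is sketched below in Python. The exact formula is not given in the text, so everything beyond the two proportionalities is an assumption: the name `reply_delay`, the `base_delay` constant, the use of a uniform random draw, and the "+ 1" added to the number of active transfers so that an idle browser still waits a small random time.

```python
import random


def reply_delay(active_transfers: int, fraction_downloaded: float,
                rng: random.Random, base_delay: float = 0.05) -> float:
    """Return a random delay in seconds before answering a request.

    The upper bound grows with the number of browsers currently being
    served and shrinks as more of the package is already held locally.
    """
    if active_transfers < 0:
        raise ValueError("active_transfers must be non-negative")
    if not 0.0 < fraction_downloaded <= 1.0:
        raise ValueError("fraction_downloaded must be in (0, 1]")
    upper_bound = base_delay * (active_transfers + 1) / fraction_downloaded
    return rng.uniform(0.0, upper_bound)
```

Because a browser stays silent once another browser has replied, the browser that draws the shortest delay tends to become the server, and idle browsers holding the complete package draw short delays most often.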

[0079] While the invention has been described with respect to a limited number of embodiments, it will be 
appreciated that many variations, modifications and other applications of the invention may be made. 

Claims 

1. A method for distributing data packages across a network, the network featuring an external server for serving at 
least one data package, the external server being a dedicated server, the steps of the method being performed by 
a data processor, the method comprising the steps of: 

(a) providing a plurality of peer clients attached to the network and providing a list of data packages, said data 
packages being stored by each of said plurality of peer clients, each data package of said data packages 
having an entry in said list, said entry indicating a unique identifier for said data package and a location of said data 
package in at least one of said plurality of peer clients; 

(b) examining said list of data packages by a first peer client to find an entry for a required data package; and 

(c) if said entry for said data package is present on said list of data packages of said first peer client, retrieving 
said data package from said location at another of said plurality of peer clients according to said entry for said 
data package. 

2. The method of claim 1, wherein said list of data packages is stored on at least said first peer client. 

3. The method of claim 2, wherein alternatively said entry for said data package is absent from said list of data 
packages of said first peer client, the method further comprising the steps of: 

(d) sending a request message for said data package by said first peer client to at least one other peer client; 

(e) [...] by said first peer client from said at least one other peer client, retrieving 
said data package from said at least one other peer client by said first peer client. 

4. The method of claim 3, the method further comprising the step of: 

(f) altering said list of data packages being stored by at least said first peer client for indicating said location of 
said data package according to said response message. 

5. The method of claim [...], wherein if said response message is not received from said at least one other peer client 
by said first peer client, the method further comprises the step of: 

(g) obtaining said data package by said first peer client from the external server. 


6. The method of claim 5, further comprising the step of sending a response message by said first peer client to said 
at least one other peer client substantially before said first peer client obtains said data package from the external 
server. 


7. The method of claim 6, wherein said list of data packages is stored on each of said plurality of peer clients, the 
method further comprising the steps of: 

(h) receiving said response message from said first peer client by said at least one other peer client; and 
(i) altering said list of data packages being stored by said at least one other peer client for indicating said 
location of said data package according to said response message. 

8. The method of claim 5, wherein said list of data packages is stored on each of said plurality of peer clients, the 
method further comprising the steps of: 

(h) receiving said response message from said first peer client by said at least one other peer client; and 
(i) altering said list of data packages being stored by said at least one other peer client for indicating said 
location of said data package according to a probabilistic function. 

9. The method of claim 1, wherein an upper limit is predetermined for a number of said plurality of peer clients served 
substantially simultaneously by said at least one other peer client, such that if a number of said plurality of peer 
clients served substantially simultaneously by said at least one other peer client is greater than said upper limit, the 
method further comprises the step of: 

(d) sending a busy message from said at least one other peer client to said first peer client. 

10. The method of claim 1, wherein the external server is a BackWeb™ server, and said plurality of peer clients is a 
plurality of BackWeb™ clients. 

11. A system for distributing data packages across a network according to a list of the data packages, the system 
comprising: 

(a) an external server for serving at least one data package, said external server being attached to the network; 
and 

(b) a plurality of peer clients attached to the network, the data packages being stored by each of said plurality 
of peer clients, each data package of said data packages having an entry in the list, said entry indicating a 
unique identifier for said data package and a location of said data package in at least one of said plurality of 
peer clients, such that each peer client retrieves a data package according to the list, each peer client first 
attempting to retrieve said data package from another of said plurality of peer clients, and alternatively 
retrieving said data package from said external server if said data package is not available from another of said 
plurality of peer clients. 